19 research outputs found

    Effects of Polarization on Particle-Laden Flows


    Scaling Laws for Discriminative Speech Recognition Rescoring Models

    Recent studies have found that model performance has a smooth power-law relationship, or scaling law, with training data and model size for a wide range of problems. These scaling laws allow one to choose nearly optimal data and model sizes. We study whether this scaling property also applies to second-pass rescoring, an important component of speech recognition systems. We focus on RescoreBERT as the rescoring model, which uses a pre-trained Transformer-based architecture fine-tuned with an ASR discriminative loss. Using such a rescoring model, we show that the word error rate (WER) follows a scaling law over more than two orders of magnitude as training data and model size increase. In addition, we find that a pre-trained model requires less data than a randomly initialized model of the same size, representing the effective data transferred from the pre-training step. This effective data transferred is also found to follow a scaling law with data and model size.
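    The power-law form described in this abstract (WER falling smoothly with training data or model size) can be illustrated with a short curve fit. A minimal sketch, assuming a WER ≈ a·N^(−b) form; the data points and parameter values below are invented placeholders, not results from the paper:

```python
# Sketch: fit a power-law scaling curve WER(N) = a * N**(-b).
# Data points are hypothetical placeholders, not values from the paper.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    return a * n ** (-b)

data_size = np.array([1e2, 1e3, 1e4, 1e5])   # e.g. hours of training data
wer = np.array([12.0, 9.5, 7.6, 6.1])        # hypothetical WER (%) at each size

(a, b), _ = curve_fit(power_law, data_size, wer, p0=(20.0, 0.1))
print(f"Fitted scaling law: WER ~ {a:.2f} * N^(-{b:.3f})")
```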

    Discriminative Speech Recognition Rescoring with Pre-trained Language Models

    Second-pass rescoring is a critical component of competitive automatic speech recognition (ASR) systems. Large language models have demonstrated their ability to use pre-trained information for better rescoring of ASR hypotheses. Discriminative training, which directly optimizes the minimum word error rate (MWER) criterion, typically improves rescoring. In this study, we propose and explore several discriminative fine-tuning schemes for pre-trained LMs. We propose two architectures based on different pooling strategies of output embeddings and compare them with probability-based MWER training. We conduct detailed comparisons between pre-trained causal and bidirectional LMs in discriminative settings. Experiments on LibriSpeech demonstrate that all MWER training schemes are beneficial, giving additional gains of up to 8.5% WER. The proposed pooling variants achieve lower latency while retaining most of the improvements. Finally, our study concludes that bidirectionality is better utilized with discriminative training. Comment: ASRU 2023.
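    As a rough illustration of the MWER criterion this abstract refers to, the sketch below computes an expected relative word-error loss over an N-best list in PyTorch. The function and tensor names, the relative-error baseline, and the toy values are assumptions, not the paper's implementation:

```python
# Sketch: expected relative word-error (MWER-style) loss over an N-best list.
import torch

def mwer_loss(scores: torch.Tensor, word_errors: torch.Tensor) -> torch.Tensor:
    """scores: (batch, n_best) rescorer scores, higher = better.
    word_errors: (batch, n_best) word-error counts per hypothesis."""
    posteriors = torch.softmax(scores, dim=-1)        # hypothesis posteriors
    errors = word_errors.float()
    mean_errors = errors.mean(dim=-1, keepdim=True)   # relative-error baseline
    return (posteriors * (errors - mean_errors)).sum(dim=-1).mean()

# Toy usage: one utterance with a 4-best list.
scores = torch.tensor([[2.1, 1.7, 0.3, -0.5]], requires_grad=True)
errors = torch.tensor([[1, 0, 3, 4]])
loss = mwer_loss(scores, errors)
loss.backward()                                       # gradients flow to the scores
```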

    Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

    We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction. Our first focus is on instruction prompting to let LLMs perform these tasks without fine-tuning, for which we evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task-activating prompting method that combines causal instructions and demonstrations to increase the usable context window. Next, we show that rescoring only by in-context learning with frozen LLMs achieves results that are competitive with rescoring by domain-tuned LMs, using a pre-trained first-pass recognition system and rescoring its output on two out-of-domain tasks (ATIS and WSJ). By combining prompting techniques with fine-tuning, we achieve error rates below the N-best oracle level, showcasing the generalization power of the LLMs. Comment: Accepted to IEEE Automatic Speech Recognition and Understanding (ASRU) 2023. 8 pages.
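    As an illustration of few-shot in-context learning for rescoring and error correction, the sketch below assembles a demonstration-plus-query prompt that could be sent to a frozen LLM. The prompt wording, helper name, and hypothesis strings are invented for illustration; the paper's actual prompting schemes (including task-activating prompting) may differ:

```python
# Sketch: build a few-shot prompt asking a frozen LLM to pick or correct the
# best hypothesis from an N-best list. All strings here are invented examples.
def build_rescoring_prompt(nbest, demonstrations):
    """nbest: hypothesis strings for the current utterance.
    demonstrations: list of (nbest_list, corrected_transcript) pairs."""
    lines = ["Select or correct the best transcript from the ASR hypotheses."]
    for demo_nbest, corrected in demonstrations:
        lines.append("Hypotheses:")
        lines.extend(f"- {h}" for h in demo_nbest)
        lines.append(f"Best transcript: {corrected}")
    lines.append("Hypotheses:")
    lines.extend(f"- {h}" for h in nbest)
    lines.append("Best transcript:")
    return "\n".join(lines)

prompt = build_rescoring_prompt(
    nbest=["show me flights to austin", "show me flights to boston"],
    demonstrations=[(["what is the weather in seatle",
                      "what is the whether in seattle"],
                     "what is the weather in seattle")],
)
# `prompt` would then be sent to the frozen LLM for completion.
```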

    Personalization for BERT-based Discriminative Speech Recognition Rescoring

    Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention based encoder-decoder model. We use internal de-identified en-US data from interactions with a virtual voice assistant, supplemented with personalized named entities, to compare these approaches. On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10% against a neural rescoring baseline. We also show that on this test set, natural language prompts can improve word error rate by 7% without any training and with only a marginal loss in generalization. Overall, gazetteers were found to perform best, with a 10% improvement in word error rate (WER), while also improving WER on a general test set by 1%.
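    A minimal sketch of the gazetteer idea mentioned above: hypotheses containing a user's personalized entities receive an additive score boost before the best candidate is picked. The scoring scheme, weight, and example entities are assumptions for illustration only, not the paper's model:

```python
# Sketch: boost N-best hypothesis scores that contain personalized entities.
def gazetteer_boost(hypothesis: str, gazetteer: set, weight: float = 0.5) -> float:
    """Additive bonus proportional to the number of matched personal entities."""
    hyp = hypothesis.lower()
    return weight * sum(1 for entity in gazetteer if entity.lower() in hyp)

personal_contacts = {"Priya Raman", "Oak Street Cafe"}   # hypothetical gazetteer
nbest = [("call priya raman now", -3.4), ("call prey a rahman now", -3.1)]
rescored = [(hyp, score + gazetteer_boost(hyp, personal_contacts))
            for hyp, score in nbest]
best_hyp, best_score = max(rescored, key=lambda item: item[1])
```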

    Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

    We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and of adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. The inserted low-rank matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation RescoreBERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times reduced by factors of 3.6 to 5.4. Comment: Accepted to IEEE ASRU 2023. 8 pages.
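    For readers unfamiliar with low-rank adaptation, the sketch below shows the general LoRA pattern the abstract builds on: a frozen pretrained linear layer plus a trainable low-rank update B·A. The dimensions, rank, and scaling are illustrative assumptions, not the LoRB implementation from the paper:

```python
# Sketch: a LoRA-style linear layer — frozen pretrained weight plus a
# trainable low-rank update (B @ A).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)      # pretrained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus the low-rank adapted path.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(768, 768, rank=8)
out = layer(torch.randn(2, 10, 768))                # only lora_a / lora_b are trainable
```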

    Analysis of gas-particle flows through multi-scale simulations

    Multi-scale structures are inherent in gas-solid flows, which makes modeling efforts challenging. On one hand, detailed simulations in which the fine structures are resolved and particle properties can be directly specified can account for complex flow behaviors, but they are too computationally expensive to apply to larger systems. On the other hand, coarse-grained simulations demand much less computation but require constitutive models that are often not readily available for given particle properties. The present study addresses this issue by providing a general framework through which the required constitutive models can be obtained from detailed simulations. To demonstrate the viability of this framework, in which closures can be proposed for different particle properties, we focus on the van der Waals force of interaction between particles. We start with Computational Fluid Dynamics–Discrete Element Method (CFD-DEM) simulations, where the fine structures are resolved and the van der Waals force between particles can be directly specified, and obtain the stress and drag closures required for coarse-grained simulations. Specifically, we develop a new cohesion model that appropriately accounts for the van der Waals force between particles in CFD-DEM simulations. We then validate this cohesion model and the CFD-DEM approach by showing that they can qualitatively capture experimental results in which the addition of small particles to gas fluidization reduces bubble sizes. Based on the DEM and CFD-DEM simulation results, we propose stress models that account for the van der Waals force between particles. Finally, we apply machine learning, specifically neural networks, to obtain a drag model that captures the effects of fine structures and inter-particle cohesion. We show that this approach, which can readily be applied to closures other than drag, can take advantage of the large amount of data generated from simulations and therefore offers superior modeling performance over traditional approaches.
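    As a loose illustration of the last step described above (learning a drag closure from simulation data with a neural network), the sketch below fits a small MLP regressor to placeholder filtered features. The feature set, target, training data, and network size are hypothetical, not the study's dataset or model:

```python
# Sketch: fit a small neural-network drag correction from filtered simulation
# data. Features, target, and data are placeholders, not the study's dataset.
import torch
import torch.nn as nn

# Hypothetical filtered inputs: solids fraction, slip velocity, filter size.
features = torch.rand(1024, 3)
drag_correction = torch.rand(1024, 1)                # placeholder closure target

model = nn.Sequential(nn.Linear(3, 32), nn.ReLU(),
                      nn.Linear(32, 32), nn.ReLU(),
                      nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                                 # fit the closure to the data
    optimizer.zero_grad()
    loss = loss_fn(model(features), drag_correction)
    loss.backward()
    optimizer.step()
```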